Compression circuitry for generating an encoded bitstream from a plurality of video frames

ABSTRACT

Data is discrete cosine transformed and streamed to a processor where quantized and inverse quantized blocks are generated. A second streaming data connection streams the inverse quantized blocks to an inverse discrete cosine transform block to generate reconstructed prediction error macroblocks. An addition circuit adds each reconstructed prediction error macroblock and its corresponding predictor macroblock to generate a respective reconstructed macroblock. The quantized macroblocks are zig-zag scanned, run level coded and variable length coded to generate an encoded bitstream.

CROSS-REFERENCE

This application is a divisional of U.S. Application for patent Ser. No. 10/391,442, filed Mar. 17, 2003, which claims priority from European Application for Patent No. 02251932.6 filed on Mar. 18, 2002, the disclosures of which are hereby incorporated by reference.

BACKGROUND OF THE INVENTION

1. Technical Field of the Invention

The present invention relates to motion picture compression circuits for pictures such as television pictures, and more particularly to a compression circuit complying with H.261 and MPEG standards.

2. Description of Related Art

FIGS. 1A-1C schematically illustrate three methods for compressing motion pictures in accordance with H.261 and MPEG standards. According to H.261 standards, pictures may be of intra or predicted type. According to MPEG standards, the pictures can also be of bidirectional type.

Intra (“I”) pictures are not coded with reference to any other pictures. Predicted (“P”) pictures are coded with reference to a past intra or past predicted picture. Bidirectional (“B”) pictures are coded with reference to both a past picture and a following picture.

FIG. 1A illustrates the compression of an intra picture I1. Picture I1 is stored in a memory area M1 before being processed. The pictures have to be initially stored in a memory since they arrive line by line whereas they are processed square by square, the size of each square being generally 16 by 16 pixels. Thus, before starting to process picture I1, memory area M1 must be filled with at least 16 lines.

The pixels of a 16 by 16-pixel square are arranged in a so-called “macroblock”. A macroblock includes four 8 by 8-pixel luminance blocks and two or four 8 by 8-pixel chrominance blocks. The processes hereinafter described are carried out by blocks of 8 by 8 pixels.

The blocks of each macroblock of picture I1 are submitted at 10 to a discrete cosine transform (DCT) followed at 11 by a quantization (Q). A DCT transforms a matrix of pixels (a block) into a matrix whose upper left corner coefficient tends to have a relatively high value. The other coefficients rapidly decrease as the position moves downwards to the right. Quantization involves dividing the coefficients of the matrix so transformed, such that a large number of coefficients which are a distance away from the upper left corner are cancelled.
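
By way of illustration only, the transform at 10 and the quantization at 11 may be sketched as follows. The sketch is written in Python and uses the direct form of the 8 by 8 DCT. The function names, the single quantization step of 16 and the test block are assumptions introduced for the example; the H.261 and MPEG standards define their own quantization rules, which are not reproduced here.

```python
import math

N = 8  # block size used throughout the description

def dct_2d(block):
    """Forward 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out

def quantize(coeffs, step):
    """Divide every coefficient by one step and round to an integer.

    A single step for the whole block is a simplification made for this sketch.
    """
    return [[int(round(value / step)) for value in row] for row in coeffs]

# A smooth gradient block: the energy concentrates in the upper left corner.
block = [[100 + 2 * x + y for y in range(N)] for x in range(N)]
q = quantize(dct_2d(block), step=16)
```

For this smooth block, the upper left coefficient of q is large and every coefficient away from the first row and first column is cancelled, which is the behaviour the paragraph above describes.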

At 12, the quantified matrices are subject to zigzag scanning (ZZ) and to run/level coding (RLC). Zigzag scanning has the consequence of improving the chances of consecutive series of zero coefficients, each of which is preceded by a non-zero coefficient. The run/level coding mainly includes replacing each series from the ZZ scanning with a pair of values, one representing the number of successive zero coefficients and the other representing the first following non-zero coefficient.
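
The scanning and run/level coding at 12 may be sketched as follows, as a minimal illustration only. The function names are assumptions, and the handling of trailing zeros (no pair is emitted for them) is a simplification; an actual coder would signal the end of the block in the manner the relevant standard prescribes.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def run_level(block):
    """Convert a quantized block into (run, level) pairs.

    run   = number of successive zero coefficients
    level = the first non-zero coefficient that follows them
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs

# Quantized block with three non-zero coefficients near the upper left corner.
example = [[0] * 8 for _ in range(8)]
example[0][0], example[0][1], example[2][0] = 55, -3, 2
pairs = run_level(example)  # [(0, 55), (0, -3), (1, 2)]
```

The scan visits (0,0), (0,1), (1,0), (2,0) and so on, so the single zero at (1,0) becomes the run of 1 that precedes the level 2.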

At 13, the pairs of values from the RLC are subject to variable length coding (VLC) that includes replacing the more frequent pairs with short codes and replacing the less frequent pairs with long codes, with the aid of correspondence tables defined by the H.261 and MPEG standards. The quantification coefficients can be varied from one block to the next by multiplication by a quantization coefficient. That quantization coefficient is inserted during variable length coding in headers preceding the compressed data corresponding to macroblocks.

Macroblocks of an intra picture are used to compress macroblocks of a subsequent picture of predicted or bidirectional type. Thus, decoding of a predicted or bidirectional picture is likely to be achieved from a previously decoded intra picture. This previously decoded intra picture does not exactly correspond to the actual picture initially received by the compression circuit, since this initial picture is altered by the quantification at 11. Thus, the compression of a predicted or bidirectional picture is carried out from a reconstructed intra picture I1r rather than from the real intra picture I1, so that decoding is carried out under the same conditions as encoding.

The reconstructed intra picture I1r is stored in a memory area M2 and is obtained by subjecting the macroblocks provided by the quantification 11 to a reverse processing, that is, at 15 an inverse quantification (Q⁻¹) followed at 16 by an inverse DCT (DCT⁻¹).

FIG. 1B illustrates the compression of a predicted picture P4. The predicted picture P4 is stored in a memory area M1. A previously processed intra picture I1r has been reconstructed in a memory area M2.

The processing of the macroblocks of the predicted picture P4 is carried out from so-called predictor macroblocks of the reconstructed picture I1r. Each macroblock of picture P4 (reference macroblock) is subject to motion estimation (ME) at 17 (generally, the motion estimation is carried out only with the four luminance blocks of the reference macroblocks).

This motion estimation includes searching in a window of picture I1r for a macroblock that is nearest, or most similar to the reference macroblock. The nearest macroblock found in the window is the predictor macroblock. Its position is determined by a motion vector V provided by the motion estimation. The predictor macroblock is subtracted at 18 from the current reference macroblock. The resulting difference macroblock is subjected to the process described with relation to FIG. 1A.
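
As a non-limiting sketch, the search at 17 and the subtraction at 18 may be modelled as an exhaustive search over a small window. The similarity measure used here, the sum of absolute differences, is an assumption: the description only requires the "nearest, or most similar" macroblock. The function names, the window of plus or minus 4 pixels and the synthetic pictures are likewise introduced for the example only.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def crop(picture, top, left, size):
    """Extract a size x size block whose upper left corner is (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]

def motion_estimate(recon_picture, current_picture, top, left, size=16, search=4):
    """Full search around (top, left); returns ((dy, dx), predictor)."""
    target = crop(current_picture, top, left, size)  # the reference macroblock
    height, width = len(recon_picture), len(recon_picture[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate would fall outside the picture
            candidate = crop(recon_picture, y, x, size)
            cost = sad(target, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx), candidate)
    _, vector, predictor = best
    return vector, predictor

def subtract(block, predictor):
    """Difference macroblock: reference macroblock minus predictor."""
    return [[p - q for p, q in zip(rb, rp)] for rb, rp in zip(block, predictor)]

# Synthetic reconstructed picture in which every sample is unique, and a
# current picture that is the same content displaced by (2, -1).
recon = [[y * 32 + x for x in range(32)] for y in range(32)]
current = [[recon[(y + 2) % 32][(x - 1) % 32] for x in range(32)] for y in range(32)]
vector, predictor = motion_estimate(recon, current, top=8, left=8)
residual = subtract(crop(current, 8, 8, 16), predictor)
```

In this contrived case the search recovers the displacement exactly, so the difference macroblock is all zeros; with real pictures the difference is generally small rather than zero.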

Like the intra pictures, the predicted pictures serve to compress other predicted pictures and bidirectional pictures. For this purpose, the predicted picture P4 is reconstructed (P4r) in a memory area M3 by an inverse quantification at 15, inverse DCT at 16, and addition at 19 of the predictor macroblock that was subtracted at 18.

The vector V provided by the motion estimation 17 is inserted in a header preceding the data provided by the variable length coding of the currently processed macroblock.

FIG. 1C illustrates the compression of a bidirectional picture B2. Bidirectional pictures are provided for in MPEG standards only. The processing of the bidirectional pictures differs from the processing of predicted pictures in that the motion estimation 17 consists in finding two predictor macroblocks in two pictures I1r and P4r, respectively, that were previously reconstructed in memory areas M2 and M3. Generally, pictures I1r and P4r respectively correspond to a picture preceding the bidirectional picture that is currently processed and to a picture following the bidirectional picture.

At 20, the mean value of the two obtained predictor macroblocks is calculated and is subtracted at 18 from the currently processed macroblock.
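
The averaging at 20 followed by the subtraction at 18 may be sketched as below. The function names are hypothetical, and rounding the mean upward on a tie is an assumption made for the example rather than something stated in this description.

```python
def average_predictors(past_pred, future_pred):
    """Sample-wise mean of two predictor blocks (ties rounded upward)."""
    return [[(a + b + 1) // 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(past_pred, future_pred)]

def bidirectional_error(current, past_pred, future_pred):
    """Currently processed block minus the mean of its two predictors."""
    mean = average_predictors(past_pred, future_pred)
    return [[c - m for c, m in zip(rc, rm)] for rc, rm in zip(current, mean)]

# One-row blocks keep the arithmetic easy to follow.
error = bidirectional_error([[10, 20]], [[8, 20]], [[11, 22]])  # [[0, -1]]
```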

The bidirectional picture is not reconstructed because it is not used to compress another picture.

The motion estimation 17 provides two vectors V1 and V2 indicating the respective positions of the two predictor macroblocks in pictures I1r and P4r with respect to the reference macroblock of the bidirectional picture. Vectors V1 and V2 are inserted in a header preceding the data provided by the variable length coding of the currently processed macroblock.

In a predicted picture, an attempt is made to find a predictor macroblock for each reference macroblock. However, in some cases, using the predictor macroblock that is found may provide a smaller compression rate than that obtained by using an unmoved predictor macroblock (zero motion vector), or even smaller than the simple intra processing of the reference macroblock. Thus, depending upon these cases, the reference macroblock is submitted to either predicted processing with the vector that is found, predicted processing with a zero vector, or intra processing.

In a bidirectional picture, an attempt is made to find two predictor macroblocks for each reference macroblock. For each of the two predictor macroblocks, the process providing the best compression rate is determined, as indicated above with respect to a predicted picture. Thus, depending on the result, the reference macroblock is submitted to either bidirectional processing with the two vectors, predicted processing with only one of the vectors, or intra processing.

Thus, a predicted picture and a bidirectional picture may contain macroblocks of different types. The type of a macroblock is also data inserted in a header during variable length coding. According to MPEG standards, the motion vectors can be defined with an accuracy of half a pixel. To search a predictor macroblock with a non integer vector, first the predictor macroblock determined by the integer part of this vector is fetched, then this macroblock is submitted to so-called “half-pixel filtering”, which includes averaging the macroblock and the same macroblock shifted down and/or to the right by one pixel, depending on the integer or non-integer values of the two components of the vector. According to H.261 standards, the predictor macroblocks may be subjected to low-pass filtering. For this purpose, information is provided with the vector, indicating whether filtering has to be carried out or not.
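
The half-pixel filtering described above may be sketched as follows. The function name, the argument layout (an integer position plus two flags marking a half-pixel component) and the upward rounding of the average are assumptions made for the illustration.

```python
def half_pixel_predictor(picture, top, left, half_y, half_x, size=16):
    """Fetch the predictor at the integer position (top, left) and average it
    with the copy shifted one pixel down and/or right, according to which
    vector components have a half-pixel part."""
    out = []
    for y in range(size):
        row = []
        for x in range(size):
            samples = [picture[top + y][left + x]]
            if half_x:
                samples.append(picture[top + y][left + x + 1])
            if half_y:
                samples.append(picture[top + y + 1][left + x])
            if half_x and half_y:
                samples.append(picture[top + y + 1][left + x + 1])
            count = len(samples)  # 1, 2 or 4 samples contribute
            row.append((sum(samples) + count // 2) // count)
        out.append(row)
    return out

# Small picture and a 2 x 2 "macroblock" so the values can be checked by hand.
pic = [[10 * y + 4 * x for x in range(4)] for y in range(4)]
integer_only = half_pixel_predictor(pic, 0, 0, False, False, size=2)
half_right = half_pixel_predictor(pic, 0, 0, False, True, size=2)
half_both = half_pixel_predictor(pic, 0, 0, True, True, size=2)
```

With both flags clear the function returns the integer-position block unchanged, which corresponds to a vector with two integer components.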

The succession of types (intra, predicted, bidirectional) is assigned to the pictures in a predetermined way, in a so-called group of pictures (GOP). A GOP generally begins with an intra picture. It is usual, in a GOP, to have a periodical series, starting from the second picture, including several successive bidirectional pictures, followed by a predicted picture, for example of the form IBBPBBPBB . . . where I is an intra picture, B a bidirectional picture, and P a predicted picture. The processing of each bidirectional picture B is carried out from macroblocks of the previous intra or predicted picture and from macroblocks of the next predicted picture.
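
Because each bidirectional picture depends on the next predicted picture, that predicted picture has to be processed first. One possible way of deriving a processing order from the display order is sketched below; the function name and the treatment of bidirectional pictures at the end of the sequence are assumptions made for the example.

```python
def coding_order(gop):
    """Reorder a display-order type string so that every B picture comes
    after both of the pictures it is predicted from."""
    out, held_b = [], []
    for index, kind in enumerate(gop):
        if kind == "B":
            held_b.append((index, kind))  # wait for the following I or P picture
        else:
            out.append((index, kind))     # I or P picture is processed first
            out.extend(held_b)            # the held B pictures can now follow
            held_b = []
    out.extend(held_b)  # B pictures with no later picture in this string are appended
    return out

order = coding_order("IBBPBBP")
# [(0, 'I'), (3, 'P'), (1, 'B'), (2, 'B'), (6, 'P'), (4, 'B'), (5, 'B')]
```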

The various functional blocks that are used in a typical prior art functional implementation are shown in FIG. 2. For clarity, the motion estimation engine and memory for storing macroblocks and video pictures have been omitted.

In FIG. 2, a reference macroblock is supplied to a subtraction circuit, where the predictor for that macroblock is subtracted (in the case of B and P pictures, only). The resultant error block (or the original macroblock, for I pictures) is passed on to a DCT block, then to a quantization block for quantization.

The quantized macroblock is forwarded to an encoding process and an inverse quantization block. The encoding process takes the quantized macroblock and zig-zag encodes it, performs run level coding on the resultant data, then variable length packs the result, outputting the now encoded bitstream.

The bitstream is monitored and can be controlled via feedback to a rate control system. This controls quantization (and dequantization) to meet certain objectives for the bitstream. A typical objective is a maximum bit-rate, although other factors can also be used.
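
The description does not specify the control law used by the rate control system. Purely as an assumed example, a feedback rule that coarsens the quantization when too many bits are produced, and refines it otherwise, could look like this. The function name, the unit adjustment and the limits of 1 and 31 are all choices made for the sketch.

```python
def update_quantizer_step(step, bits_used, bits_target, lowest=1, highest=31):
    """Return the quantization step to use for the next macroblock."""
    if bits_used > bits_target:
        step += 1  # over budget: quantize more coarsely
    elif bits_used < bits_target:
        step -= 1  # under budget: quantize more finely
    return max(lowest, min(highest, step))

next_step = update_quantizer_step(8, bits_used=1200, bits_target=1000)  # 9
```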

The inverse quantization block in FIG. 2 is the start of a reconstruction chain that is used to generate a reconstructed version of each frame, so that the frames the motion prediction engine is searching for matching macroblocks are the same as will be regenerated during decoding proper. After inverse quantization, the macroblock is inverse DCT transformed in IDCT block and added to the original predictor used to generate the error macroblock. This reconstructed block is stored in memory for subsequent use in the motion estimation process.

The various blocks required to generate the encoded output stream have different computational requirements, which themselves can vary according to the particular application or user selected restrictions. Throttling of the output bitstream to meet bandwidth requirements is typically handled by manipulating the quantization step.

Pure hardware architectures, while potentially the most efficient, suffer from lack of flexibility since they can support only a restricted range of standards; moreover they have long design/verification cycles. On the other hand, pure software solutions, while being the most flexible, require high-performance processors unsuited to low-cost consumer applications.

It would be desirable to provide an architecture that allowed for relatively flexible bitstream control while reducing the amount of software-based processing power required.

SUMMARY

In an embodiment, a decoder circuit comprises: a processor configured to inverse quantize macroblocks to generate inverse quantized macroblocks; an inverse discrete cosine transformation circuit that processes the inverse quantized macroblocks from the processor to generate IDCT transformed macroblocks; and an addition circuit that adds a single IDCT transformed macroblock and a corresponding predictor macroblock to generate a reconstructed picture macroblock.

In an embodiment, a method for decoding an encoded bitstream comprises: inverse quantizing decoded macroblocks in a processor to generate inverse quantized macroblocks; generating inverse discrete cosine transformation (IDCT) transformed macroblocks from the inverse quantized macroblocks; and adding a single IDCT transformed macroblock and a corresponding predictor macroblock to generate a reconstructed picture macroblock.

In an embodiment, a video compression circuit comprises: a discrete cosine transform (DCT) circuit for accepting prediction error macroblocks and generating DCT transformed macroblocks; a processor being configured to quantize the DCT transformed macroblocks to generate quantized macroblocks, and to inverse quantize the quantized macroblocks to generate inverse quantized macroblocks; an inverse discrete cosine transform (IDCT) circuit, wherein the IDCT circuit transforms the inverse quantized macroblocks to generate reconstructed prediction error macroblocks; and an addition circuit for adding a single reconstructed prediction error macroblock and a corresponding predictor macroblock to generate respective reconstructed macroblocks for use in the encoding of other macroblocks.

In an embodiment, a method of generating a compressed video bitstream comprises: generating DCT transformed macroblocks by applying prediction error macroblocks to a discrete cosine transform (DCT) circuit; quantizing the DCT transformed macroblocks to generate quantized macroblocks; inverse quantizing the quantized macroblocks to generate inverse quantized macroblocks; generating reconstructed prediction error macroblocks by applying the inverse quantized macroblocks to an IDCT circuit; and adding a single reconstructed prediction error macroblock and a corresponding predictor macroblock to generate respective reconstructed macroblocks for use in the encoding of other macroblocks.

In an embodiment, an encoder/decoder circuit comprises: a discrete cosine transform (DCT) circuit to generate DCT transformed macroblocks from prediction error macroblocks; a processor configured to quantize the DCT transformed macroblocks to generate quantized macroblocks, and to inverse quantize the quantized macroblocks to generate inverse quantized macroblocks; an inverse discrete cosine transform (IDCT) circuit to transform the inverse quantized macroblocks to generate reconstructed prediction error macroblocks; an addition circuit to add a single reconstructed prediction error macroblock and a corresponding predictor macroblock to generate respective reconstructed macroblocks; and a control circuit to configure the encoder/decoder circuit to encode or decode a bitstream.

In an embodiment, a method for encoding and decoding in an encoder/decoder circuit having a control circuit to configure the encoder/decoder circuit for encoding or decoding mode comprises: generating DCT transformed macroblocks by applying prediction error macroblocks to a discrete cosine transform (DCT) circuit; quantizing the DCT transformed macroblocks to generate quantized macroblocks; inverse quantizing the quantized macroblocks to generate inverse quantized macroblocks; generating reconstructed prediction error macroblocks by applying the inverse quantized macroblocks to the IDCT circuit; and adding the reconstructed prediction error macroblocks and corresponding predictor macroblocks to generate respective reconstructed macroblocks; wherein the reconstructed macroblocks are useful either as decoded reconstructed picture macroblocks or for encoding other macroblocks.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding of the method and apparatus may be acquired by reference to the following Detailed Description when taken in conjunction with the accompanying Drawings wherein:

FIGS. 1A to 1C, previously described, illustrate three picture compression processes according to H.261 and MPEG standards, in accordance with the prior art;

FIG. 2, previously described, is a schematic of the functional blocks in a typical MPEG encoding scheme, in accordance with the prior art;

FIG. 3 is a schematic of an encoder loop; and

FIG. 4 is a schematic of compression circuitry for generating an encoded bitstream from a plurality of video frames.

DETAILED DESCRIPTION OF THE DRAWINGS

FIG. 3 shows an overview of the functional blocks of one embodiment, in which hardware functionality is represented by rectangular blocks and software functionality is represented by an oval block.

The functional blocks include a subtraction circuit 300 for subtracting each predictor macroblock, as supplied by the motion estimation engine (described later) from its corresponding picture macroblock, to generate a prediction error macroblock. For an I picture, there is no predictor, so the macroblock is passed through the subtraction circuit with no change.

The prediction error macroblock is supplied to a DCT circuit 301 where a forward discrete cosine transform (DCT) is performed. Such hardware and its operation are well known in the prior art and will not be described here in further detail.

The output of the DCT is streamed to a processor 302 (described later) which performs the quantization, zig-zag coding, and run level coding steps in the encoding process. The resultant data is variable length coded and output as an encoded bitstream. In the simplified schematic of FIG. 3, the variable length coding takes place in software. However, in an alternative embodiment described later, the variable length coding and packing, or just packing, is performed in hardware, since this provides a drastic increase in performance compared to software coding running on a general purpose processor.

The processor 302 also performs inverse quantization (Q⁻¹), and the resultant inverse quantized macroblocks are sent to an inverse DCT (IDCT) circuit 303 via a streaming interface. An inverse DCT (IDCT) is performed and the resultant reconstructed error macroblock is added to the original predictor macroblock (for P and B pictures only) by an addition circuit 304. The predictor macroblocks have been delayed in a delay buffer 305. For I and P pictures, the macroblock is fully reconstructed after the IDCT circuit. The resultant reconstructed macroblocks are then stored in memory for use by the motion estimation engine in generating predictors for future macroblocks. This is necessary because it is reconstructed macroblocks that a decoder will subsequently use to reconstruct the pictures.
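
The loop of FIG. 3 may be modelled in software for a single 8 by 8 block as sketched below. This is an illustrative model only: in the embodiment the DCT 301, IDCT 303 and addition 304 are hardware, and only the quantization and inverse quantization run on the processor 302. The function names, the matrix formulation of the transforms and the uniform quantization step are assumptions of the sketch.

```python
import math

N = 8
# Orthonormal DCT-II basis C[u][x]; the inverse transform uses its transpose.
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct(block):    # C * block * C^T, standing in for circuit 301
    return matmul(matmul(C, block), transpose(C))

def idct(coeffs):  # C^T * coeffs * C, standing in for circuit 303
    return matmul(matmul(transpose(C), coeffs), C)

def quantize(coeffs, step):
    return [[int(round(v / step)) for v in row] for row in coeffs]

def dequantize(levels, step):
    return [[v * step for v in row] for row in levels]

def encode_and_reconstruct(block, predictor, step):
    """Return the quantized levels and the reconstructed block."""
    # Subtraction 300: prediction error (use an all-zero predictor for I pictures).
    error = [[b - p for b, p in zip(rb, rp)] for rb, rp in zip(block, predictor)]
    levels = quantize(dct(error), step)         # DCT, then quantization
    rec_error = idct(dequantize(levels, step))  # inverse quantization, then IDCT
    # Addition 304: reconstructed error plus the delayed predictor.
    reconstructed = [[int(round(e)) + p for e, p in zip(r_err, r_pred)]
                     for r_err, r_pred in zip(rec_error, predictor)]
    return levels, reconstructed

predictor = [[50] * N for _ in range(N)]
block = [[66] * N for _ in range(N)]
levels, reconstructed = encode_and_reconstruct(block, predictor, step=8)
```

In this flat example the only non-zero level is the upper left one, and the reconstructed block equals the input. In general the reconstruction differs from the input by the quantization error, which is why the reconstructed macroblocks, rather than the originals, are stored for later prediction.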

FIG. 4 shows a more detailed version of the embodiment of FIG. 3, and like features are denoted by corresponding reference numerals. In FIG. 4, the motion estimation engine 400 for use with the encoding circuitry is also shown. The motion estimation engine 400 determines the best matching macroblock (or average of two macroblocks) for each macroblock in the frame (for B and P pictures only) and subtracts it from the macroblock being considered to generate a prediction error macroblock. The method of selecting predictor macroblocks is not a part of the present solution and so is not described in greater detail herein.

The motion estimation engine 400 outputs the macroblocks, associated predictor macroblocks and vectors, and other information such as frame type and encoding modes, to DCT/IDCT circuitry via a direct link. Alternatively, this information can be transferred over a data bus. Data bus transfer principles are well known and so are not described in detail.

The DCT and IDCT steps are performed in a DCT/IDCT block 401, which includes combined DCT/IDCT circuitry 301/303 that is selectable to perform either operation on incoming data. The input is selected by way of a multiplexer 402, the operation of which will be described in greater detail below. The output of the multiplexer is supplied to the delay block 305 and the DCT/IDCT circuitry 301/303. Additional data supplied by the motion estimation engine 400, such as the motion vector(s), encoding decisions (intra/non-intra, MC/no MC, field/frame prediction, field/frame DCT) is routed past the delay and DCT/IDCT blocks to a first streaming data interface SDI 403.

The outputs of the delay block and the DCT/IDCT circuitry are supplied to an addition circuit 304, the output of which is sent to memory 450. The output of the DCT/IDCT block 301/303 is also supplied to the first SDI port 403.

The first SDI port 403 accepts data from the DCT/IDCT block 301/303 and the multiplexer 402 and converts it into a format suitable for streaming transmission to a corresponding second streaming SDI port 404. The streaming is controlled by a handshake arrangement between the respective SDI ports. The second streaming SDI port 404 takes the streaming data from the first SDI port 403 and converts it back into a format suitable for use within the processor 302.
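
The handshake between a pair of SDI ports is not detailed in this description. As one assumed model, the link can be pictured as a small buffer in which a word is transferred only when the receiving side signals that it has room, so that the sender stalls whenever the consumer is slower. The class name, the buffer depth and the cycle-based simulation below are all hypothetical.

```python
from collections import deque

class StreamingLink:
    """Toy model of two streaming ports joined by a small buffer."""

    def __init__(self, depth=2):
        self.fifo = deque()
        self.depth = depth

    def ready(self):
        """Receiving side indicates that it can accept another word."""
        return len(self.fifo) < self.depth

    def send(self, word):
        """Sending side: the transfer succeeds only if the receiver is ready."""
        if not self.ready():
            return False  # sender must hold the word and retry later
        self.fifo.append(word)
        return True

    def receive(self):
        """Consumer side: returns None when no word is available."""
        return self.fifo.popleft() if self.fifo else None

def simulate(words, depth=2, consume_every=2):
    """Producer offers one word per cycle; consumer reads every few cycles."""
    link, pending = StreamingLink(depth), deque(words)
    received, stalls, cycle = [], 0, 0
    while pending or link.fifo:
        if pending:
            if link.send(pending[0]):
                pending.popleft()
            else:
                stalls += 1
        if cycle % consume_every == 0:
            word = link.receive()
            if word is not None:
                received.append(word)
        cycle += 1
    return received, stalls

received, stalls = simulate(list(range(10)))
```

The words arrive in order and none is lost, while the stall count shows the producer being paced by the consumer, which is the property the handshake is there to provide.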

Once the data has been transformed back into a synchronous format, the processor performs quantization 405, inverse quantization 406 and zig-zag/run level coding 407 as described previously. It will be appreciated that the particular implementation of these steps in software is not relevant, and so is not described in detail.

After inverse quantization, the macroblock is returned to a third SDI port 408, which operates in the same way as the first streaming port to convert and stream the data to a fourth SDI port 409, which converts the data for synchronous use and supplies it to the multiplexer 402.

The processor 302 outputs the run level coded data to a fifth SDI port 410, which in a similar fashion to the first and third SDI ports, formats the data for streaming transmission to a sixth SDI port 411, which in turn reformats the data into a synchronous format. The data is then variable length coded and packed in hardware VLC circuitry 412. The particular workings of the hardware VLC packing circuitry 412 are well known in the art, are not critical to understanding the present solution and so will not be described in detail. Indeed, as mentioned previously, the VLC operation can be performed in software by the processor, for a corresponding cost in processor cycles.

It will be appreciated that a number of control lines and ancillary detail has been omitted for clarity. For example, it is clear the multiplexer and DCT/IDCT block 301/303 need to be controlled to ensure that the correct data is being fed to the DCT/IDCT block and that the correct operation is being performed. For example, when the initial DCT operation 301 is being performed, the multiplexer 402 is controlled to provide data from the bus (supplied by the motion estimation engine) to the DCT/IDCT block 301/303, which is set to DCT mode. However, when performing the IDCT operation 303, the multiplexer 402 sends data from the fourth SDI port 409 to the DCT/IDCT block 301/303, which is set to IDCT mode.

Similarly, some support hardware that would exist in the actual implementation has been omitted. An obvious example is buffers on the various inputs and outputs. It would be usual in such circuitry to include FIFO buffers supporting the SDI ports to maximize throughput. For the purposes of clarity, such support hardware is not explicitly shown. However, it will be understood by those skilled in the art to be implicitly present in any practical application.

It will be appreciated that, in the encoding mode described above, the DCT and IDCT functions of the DCT/IDCT block 301/303 will be performed in an interleaved manner, with one or more DCT operations being interleaved with one or more IDCT operations, depending upon the order of I, P and B pictures being encoded.

With slight modifications to control software and circuitry, the encoding circuitry described above can perform decoding of an encoded MPEG stream. This is because the inverse quantization software and IDCT hardware are common to the encoding and decoding process. There are at least three ways this can be achieved:

Option 1. If it is only required to offload the IDCT processing from the processor, the dequantized coefficient blocks can be streamed from the processor to the IDCT/DCT block 301/303 via the third and fourth SDI ports 408 and 409. The results of the IDCT are then read back via the first and second SDI ports 403 and 404.

Option 2. Option 1 can be extended to allow more of the decoding load to be passed to the DCT/IDCT block 401. In particular, the predictor blocks are read into the delay buffer 305. The coefficient blocks are then read in via the same route by the DCT/IDCT block 301/303 (in IDCT mode). After the IDCT has taken place, the predictor and IDCT processed macroblocks are combined by the addition circuitry 304 and written to system memory via the system data bus.

Option 3. In an alternative to option 2, the motion estimation block is configured to provide the predictor blocks to the delay buffer 305 via the multiplexer 402. The coefficient blocks are provided to the DCT/IDCT block 301/303 (in IDCT mode), and the remainder of the procedure is as per the second decoding arrangement.

Although preferred embodiments of the method and apparatus of the present invention have been illustrated in the accompanying Drawings and described in the foregoing Detailed Description, it will be understood that the invention is not limited to the embodiments disclosed, but is capable of numerous rearrangements, modifications and substitutions without departing from the spirit of the invention as set forth and defined by the following claims.

1. A decoder circuit, comprising: a processor configured to inverse quantize macroblocks to generate inverse quantized macroblocks; an inverse discrete cosine transformation circuit that processes the inverse quantized macroblocks from the processor to generate IDCT transformed macroblocks; and an addition circuit that adds a single IDCT transformed macroblock and a corresponding predictor macroblock to generate a reconstructed picture macroblock.
2. The decoder circuit of claim 1, further comprising a delay buffer for storing the corresponding predictor macroblocks.
3. The decoder circuit of claim 2, wherein a motion estimation engine provides the corresponding predictor macroblocks to the delay buffer.
4. The decoder circuit of claim 3, further comprising a first streaming data connection for streaming the inverse quantized macroblocks from the processor to the IDCT circuit.
5. The decoder circuit of claim 4, wherein the IDCT circuit processes data at a rate determined by the arrival of data from the relevant data connection.
6. The decoder circuit of claim 5, wherein the IDCT circuit processes data at a rate determined by a handshake control signal.
7. The decoder circuit of claim 6, further comprising a macroblock memory to store the reconstructed picture macroblocks.
8. A method for decoding an encoded bitstream, comprising: inverse quantizing decoded macroblocks in a processor to generate inverse quantized macroblocks; generating inverse discrete cosine transformation (IDCT) transformed macroblocks from the inverse quantized macroblocks; and adding a single IDCT transformed macroblock and a corresponding predictor macroblock to generate a reconstructed picture macroblock.
9. The method according to claim 8, further comprising storing the corresponding predictor macroblocks in a delay buffer.
10. The method according to claim 9, further comprising receiving the corresponding predictor macroblocks from a motion estimation engine.
11. The method according to claim 10, further comprising streaming the inverse quantized macroblocks from the processor to the IDCT circuit.
12. The method according to claim 11, wherein generating the IDCT transformed macroblocks takes place at a rate determined by the arrival of data.
13. The method according to claim 12, wherein generating the IDCT transformed macroblocks takes place at a rate determined by a handshake control signal.
14. The method according to claim 13, further comprising storing the reconstructed picture macroblocks in a macroblock memory.
15. A video compression circuit, comprising: a discrete cosine transform (DCT) circuit for accepting prediction error macroblocks and generating DCT transformed macroblocks; a processor being configured to quantize the DCT transformed macroblocks to generate quantized macroblocks, and to inverse quantize the quantized macroblocks to generate inverse quantized macroblocks; an inverse discrete cosine transform (IDCT) circuit, wherein the IDCT circuit transforms the inverse quantized macroblocks to generate reconstructed prediction error macroblocks; and an addition circuit for adding a single reconstructed prediction error macroblock and a corresponding predictor macroblock to generate respective reconstructed macroblocks for use in the encoding of other macroblocks.
16. The compression circuit of claim 15, further comprising means for zig-zag scanning, run level coding and variable length coding the quantized macroblocks to generate an encoded bitstream.
17. The compression circuit of claim 16, wherein the means for zig-zag scanning and run length coding is the processor configured to implement the zig-zag scanning and run length coding, and the means for variable length coding is a hardware VLC packer.
18. The compression circuit of claim 17, further comprising: a first streaming data connection for streaming the DCT transformed macroblocks from the DCT transformation circuit to the processor; a second streaming data connection for streaming the inverse quantized macroblocks from the processor to the IDCT transformation circuit; and a third streaming data connection for streaming the run length coded data from the processor to the hardware VLC packer.
19. The compression circuit of claim 18, wherein the DCT circuit, the IDCT circuit, and the hardware VLC packer process data at a rate determined by the arrival of data from the relevant data connection.
20. The compression circuit according to claim 19, wherein the DCT circuit, the IDCT circuit, and the hardware VLC packer process data at a rate determined by a handshake control signal.
21. The compression circuit according to claim 20, further comprising a motion estimation engine for supplying the prediction error macroblocks to the DCT circuit.
22. The compression circuit according to claim 21, further comprising a macroblock memory for storing the reconstructed macroblocks.
 23. A method of generating a compressed video bitstream,the method comprising: generating DCT transformed macroblocks byapplying prediction error macroblocks to a discrete cosine transform(DCT) circuit; quantizing the DCT transformed macroblocks to generatequantize macroblocks; inverse quantizing the quantize macroblocks togenerate inverse quantize macroblocks; generating reconstructedprediction error macroblocks by applying the inverse quantizemacroblocks to a IDCT circuit; and adding a single reconstructedprediction error macroblock and a corresponding predictor macroblock togenerate respective reconstructed macroblocks for use in the encoding ofother macroblocks.
 24. The method according to claim 23, further comprising generating an encoded bitstream by zig-zag scanning, run level coding and variable length coding the quantized macroblocks.
 25. The method according to claim 24, wherein generating the encoded bitstream by zig-zag scanning and run length coding the quantized macroblocks is performed by the processor configured to implement the zig-zag scanning and run length coding, and by variable length coding the run length coded macroblocks in a hardware VLC packer.
 26. The method according to claim 25, further comprising: streaming the DCT transformed macroblocks from the DCT transformation circuit to the processor; streaming the inverse quantized macroblocks from the processor to the IDCT transformation circuit; and streaming the run length coded data to the hardware VLC packer.
 27. The method according to claim 26, wherein generating the DCT transformed macroblocks, generating the reconstructed prediction error macroblocks, and generating the encoded bitstream take place at a rate determined by the arrival of data from the relevant data connection.
 28. The method according to claim 27, wherein generating the DCT transformed macroblocks, generating the reconstructed prediction error macroblocks, and generating the encoded bitstream take place at a rate determined by a handshake control signal.
 29. The method according to claim 28, further comprising receiving the prediction error macroblocks from a motion estimation engine.
 30. The method according to claim 29, further comprising storing the reconstructed macroblocks in a macroblock memory.
 31. An encoder/decoder circuit, comprising: a discrete cosine transform (DCT) circuit to generate DCT transformed macroblocks from prediction error macroblocks; a processor configured to quantize the DCT transformed macroblocks to generate quantized macroblocks, and to inverse quantize the quantized macroblocks to generate inverse quantized macroblocks; an inverse discrete cosine transform (IDCT) circuit to transform the inverse quantized macroblocks to generate reconstructed prediction error macroblocks; an addition circuit to add a single reconstructed prediction error macroblock and a corresponding predictor macroblock to generate respective reconstructed macroblocks; and a control circuit to configure the encoder/decoder circuit to encode or decode a bitstream.
 32. The encoder/decoder of claim 31, wherein the encoder/decoder circuit configured for decoding mode uses the processor configured to inverse quantize macroblocks, the IDCT circuit, and the addition circuit to generate the reconstructed macroblocks.
 33. The encoder/decoder of claim 32, further comprising means for zig-zag scanning, run level coding and variable length coding the quantized macroblocks to generate an encoded bitstream.
 34. The encoder/decoder of claim 33, wherein the means for zig-zag scanning and run length coding is the processor configured to implement the zig-zag scanning and run length coding, and the means for variable length coding is a hardware VLC packer.
 35. The encoder/decoder of claim 34, further comprising a delay buffer for storing the corresponding predictor macroblocks.
 36. The encoder/decoder of claim 35, wherein a motion estimation engine provides the corresponding predictor macroblocks to the delay buffer.
 37. The encoder/decoder of claim 36, further comprising: a first streaming data connection for streaming DCT transformed macroblocks to the processor; a second streaming data connection for streaming the inverse quantized macroblocks from the processor to the IDCT circuit; and a third streaming data connection for streaming the run length coded data from the processor to the hardware VLC packer.
 38. The encoder/decoder of claim 37, wherein the DCT circuit, the IDCT circuit, and the hardware VLC packer process data at a rate determined by the arrival of data from the relevant data connection.
 39. The encoder/decoder of claim 38, wherein the DCT circuit, the IDCT circuit, and the hardware VLC packer process data at a rate determined by a handshake control signal.
 40. The encoder/decoder of claim 39, further comprising a macroblock memory to store the reconstructed macroblocks.
 41. A method for encoding and decoding in an encoder/decoder circuit having a control circuit to configure the encoder/decoder circuit for encoding or decoding mode, comprising: generating DCT transformed macroblocks by applying prediction error macroblocks to a discrete cosine transform (DCT) circuit; quantizing the DCT transformed macroblocks to generate quantized macroblocks; inverse quantizing the quantized macroblocks to generate inverse quantized macroblocks; generating reconstructed prediction error macroblocks by applying the inverse quantized macroblocks to the IDCT circuit; and adding the reconstructed prediction error macroblocks and corresponding predictor macroblocks to generate respective reconstructed macroblocks; wherein the reconstructed macroblocks are useful either as decoded reconstructed picture macroblocks or for encoding other macroblocks.
 42. The method according to claim 41, further comprising generating an encoded bitstream by zig-zag scanning, run level coding and variable length coding the quantized macroblocks.
 43. The method according to claim 42, wherein generating the encoded bitstream by zig-zag scanning and run length coding the quantized macroblocks is performed by the processor configured to implement the zig-zag scanning and run length coding, and by variable length coding the run length coded macroblocks in a hardware VLC packer.
 44. The method according to claim 43, further comprising storing the corresponding predictor macroblocks in a delay buffer.
 45. The method according to claim 44, further comprising receiving the corresponding predictor macroblocks and the prediction error macroblocks from a motion estimation engine.
 46. The method according to claim 45, further comprising: streaming the DCT transformed macroblocks to the processor; streaming the inverse quantized macroblocks from the processor to the IDCT circuit; and streaming the run length coded data from the processor to the hardware VLC packer.
 47. The method according to claim 46, wherein generating the DCT transformed macroblocks, generating the reconstructed prediction error macroblocks, and generating the encoded bitstream take place at a rate determined by the arrival of data from the relevant data connection.
 48. The method according to claim 47, wherein generating the DCT transformed macroblocks, generating the reconstructed prediction error macroblocks, and generating the encoded bitstream take place at a rate determined by a handshake control signal.
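
By way of non-limiting illustration only, the processor-side steps recited in the method claims (quantizing, inverse quantizing, zig-zag scanning, run level coding, and the addition of a reconstructed prediction error block to its predictor) may be sketched in software as follows. The sketch is not part of the claims and makes several assumptions that do not come from the disclosure: an 8 by 8 block size, a single uniform quantizer step `qstep` with rounding toward zero, and the hypothetical function names `zigzag_order`, `quantize`, `inverse_quantize`, `run_level` and `reconstruct`. The DCT, the IDCT and the variable length coding, which the claims assign to hardware, are omitted.

```python
# Illustrative sketch only; names and the uniform quantizer are assumptions.

N = 8  # assumed block size


def zigzag_order(n=N):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked with the row decreasing, odd ones with
        # the row increasing, which yields the usual zig-zag path.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def quantize(coeffs, qstep):
    """Uniform quantizer (assumed): divide by qstep, round toward zero."""
    return [[int(c / qstep) for c in row] for row in coeffs]


def inverse_quantize(levels, qstep):
    """Inverse of the assumed uniform quantizer: multiply by qstep."""
    return [[level * qstep for level in row] for row in levels]


def run_level(levels):
    """Zig-zag scan a quantized block and emit (run, level) pairs.

    `run` counts the zeros preceding each non-zero level. Trailing zeros
    are dropped, as an end-of-block code would imply.
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(levels)):
        value = levels[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


def reconstruct(pred_error, predictor):
    """Addition step: reconstructed block = prediction error + predictor."""
    return [[e + p for e, p in zip(e_row, p_row)]
            for e_row, p_row in zip(pred_error, predictor)]
```

In such a sketch, the output of `run_level` would correspond to the run length coded data streamed to the hardware VLC packer, and the output of `inverse_quantize` to the blocks streamed to the IDCT circuit.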